
    Detection of Polyps via Shape and Appearance Modeling

    Presented at the MICCAI 2008 Workshop on Computational and Visualization Challenges in the New Era of Virtual Colonoscopy, September 6, 2008, New York, USA. This paper describes a CAD system for the detection of colorectal polyps in CT. It is based on stochastic shape and appearance modeling of structures of the colon and rectum; in contrast to the data-driven approaches more commonly found in the literature, it derives predictive stochastic models for the features used for classification. The method makes extensive use of medical domain knowledge in the design of the models and in the setting of their parameters. The proposed approach was successfully tested on challenging datasets acquired under a protocol with little colonic preparation; such a protocol reduces patient discomfort and potentially improves compliance.
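    The model-based (rather than data-driven) classification idea can be illustrated with a toy sketch: candidate structures are scored by a log-likelihood ratio under predictive Gaussian models for polyps and for haustral folds, whose parameters would come from domain knowledge rather than training data. The feature choices and all numbers below are hypothetical, for illustration only.

    ```python
    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class GaussianModel:
        """Predictive stochastic model over a feature vector."""
        mean: np.ndarray
        cov: np.ndarray

        def loglik(self, x: np.ndarray) -> float:
            d = x - self.mean
            inv = np.linalg.inv(self.cov)
            _, logdet = np.linalg.slogdet(self.cov)
            return float(-0.5 * (d @ inv @ d + logdet + len(x) * np.log(2 * np.pi)))

    # Hypothetical features: (mean curvature, radius in mm).
    # Parameters set from assumed domain knowledge, not fitted to data.
    polyp = GaussianModel(np.array([0.8, 4.0]), np.diag([0.05, 1.0]))
    fold = GaussianModel(np.array([0.2, 10.0]), np.diag([0.05, 4.0]))

    def is_polyp(features: np.ndarray, threshold: float = 0.0) -> bool:
        """Classify a candidate by log-likelihood ratio of the two models."""
        return polyp.loglik(features) - fold.loglik(features) > threshold

    candidate = np.array([0.75, 4.5])   # compact, strongly curved structure
    ```

    A candidate near the polyp model's mean scores positive, while an elongated fold-like structure scores negative; the threshold trades sensitivity against false positives.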

    Masked Vision and Language Modeling for Multi-modal Representation Learning

    In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal of one modality is reconstructed with the help of the other modality. This is motivated by the nature of image-text paired data: both the image and the text convey almost the same information, but in different formats. Reconstructing the masked signal of one modality conditioned on the other can also implicitly learn cross-modal alignment between language tokens and image patches. Our experiments on various V+L tasks show that the proposed method not only achieves state-of-the-art performance when using a large amount of data, but also outperforms the other competitors by a significant margin in regimes of limited training data.
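    The core intuition, that paired modalities carry nearly the same information, so one can fill in the other's masked signal, can be sketched in a toy linear setting. Here a shared latent generates both "image" and "text" features, image features are masked, and a least-squares map stands in for the cross-modal decoder (the actual method uses transformers; everything below is an illustrative assumption).

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy paired data: each sample's image-patch vector and text-token
    # vector are different projections of the same underlying content.
    latent = rng.normal(size=(64, 8))           # shared content
    image = latent @ rng.normal(size=(8, 16))   # "image patch" features
    text = latent @ rng.normal(size=(8, 12))    # "text token" features

    # Mask ~40% of the image features.
    mask = rng.random(image.shape) < 0.4
    masked_image = np.where(mask, 0.0, image)

    # Reconstruct image features conditioned on the text modality
    # (linear least squares as a stand-in for the cross-modal decoder).
    W, *_ = np.linalg.lstsq(text, image, rcond=None)
    recon = text @ W

    # Because both modalities encode the same latent content, the
    # reconstruction error at masked positions is essentially zero.
    err = np.abs(recon - image)[mask].mean()
    ```

    In the real setting the shared structure is semantic rather than linear, but the same principle applies: masked positions in one modality are recoverable from the other, which forces the model to learn cross-modal alignment.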

    Robot Navigation with Online Control

    This work demonstrates a method of using information derived from a partially calibrated vision system to guide a mobile vehicle through free space in an environment with static obstacles. The system consists of a mobile robot with a pair of cameras mounted on it to grab stereo images of the scene. Correspondences are determined between corners in the two images. Information derived from a one-time offline calibration is used in back projection to recover the 3D structure. This is projected onto ground-parallel points, and the robot moves through the available space. The ultimate goal of a navigation project is to provide automated control for the robot. In this work, we have implemented a smaller subset of the total navigation problem. Our objective is to demonstrate the use of a simpler method, instead of existing elaborate and expensive techniques, to build a robust navigation system. Our work provides a user-controlled system for navigating through the free space. We have al..
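    The back-projection step described above, recovering a 3D point from a pair of corresponding corners given calibrated cameras, is commonly done by linear (DLT) triangulation. A minimal sketch, assuming two hypothetical projection matrices with a small stereo baseline (the specific calibration values are illustrative, not from the paper):

    ```python
    import numpy as np

    def triangulate(P1, P2, x1, x2):
        """Linear (DLT) triangulation of one corner correspondence.
        P1, P2: 3x4 camera projection matrices; x1, x2: (u, v) pixels."""
        A = np.vstack([
            x1[0] * P1[2] - P1[0],
            x1[1] * P1[2] - P1[1],
            x2[0] * P2[2] - P2[0],
            x2[1] * P2[2] - P2[1],
        ])
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]                     # null vector of A
        return X[:3] / X[3]            # homogeneous -> 3D point

    # Hypothetical calibrated stereo pair separated along x.
    K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), [[-0.1], [0.0], [0.0]]])  # 0.1 m baseline

    def project(P, X):
        x = P @ np.append(X, 1.0)
        return x[:2] / x[2]

    # Project a known 3D corner into both views, then recover it.
    X_true = np.array([0.3, -0.2, 2.0])
    X_rec = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))

    # Ground-plane footprint (drop the height coordinate), as used
    # when projecting recovered structure onto ground-parallel points.
    ground = X_rec[[0, 2]]
    ```

    Collapsing the recovered points onto the ground plane yields a 2D free-space map through which the vehicle can be steered.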